Abstract. The classical problem of optimal transportation can be
formulated as a linear optimization problem on a convex domain:
among all joint measures with fixed marginals find the optimal one,
where optimality is measured against a cost function. Here we
consider a natural but largely unexplored variant of this problem by
imposing a pointwise constraint on the joint (absolutely continuous)
measures: among all joint densities with fixed marginals and which
are dominated by a given density, find the optimal one. For this
variant, we show that local non-degeneracy of the cost function
implies every minimizer is extremal in the convex set of competitors,
hence unique. An appendix develops rudiments of a duality theory for
this problem, which allows us to compute several suggestive examples.



1. Introduction 

The optimal transportation problem of Monge [Mo81] and Kantorovich
[K42] has attracted much attention in recent years; see the surveys
[AG11] [MG10] [V03] [V09]. However, there is a variant of the problem
which is almost as natural but remains unexplored outside the discrete
setting. This variant, tackled below, involves imposing capacity
constraints which limit the amount transported between any given
source and corresponding sink.

Let $L^1_c(\mathbb{R}^n)$ denote the space of $L^1(\mathbb{R}^n)$-functions
with compact support, where $L^1$ is with respect to Lebesgue measure.
In this paper functions typically represent mass densities. Given
densities $0 \le f, g \in L^1_c(\mathbb{R}^d)$ with same total mass
$\int f = \int g$, let $\Gamma(f,g)$ denote the set of joint densities
$0 \le h \in L^1_c(\mathbb{R}^d \times \mathbb{R}^d)$ which have $f$ and $g$
as their marginals: $f(x) = \int_{\mathbb{R}^d} h(x,y)\,dy$ and
$g(y) = \int_{\mathbb{R}^d} h(x,y)\,dx$. The set $\Gamma(f,g)$ is a convex set.

A cost function $c(x,y)$ represents the cost per unit mass for
transporting material from $x \in \mathbb{R}^d$ to $y \in \mathbb{R}^d$.
Given densities $0 \le f, g \in L^1_c(\mathbb{R}^d)$ with same total mass,
and a cost $c(x,y)$, the problem of optimal transportation is to
minimize the transportation cost

(1) $I_c(h) := \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y)\,h(x,y)\,dx\,dy$

among joint densities $h$ in $\Gamma(f,g)$, to obtain the optimal cost

(2) $\inf_{h \in \Gamma(f,g)} I_c(h)$.

In the context of transportation, a joint density $h \in \Gamma(f,g)$
can be thought of as representing a transportation plan.

In this paper we will sometimes refer to the traditional optimal trans- 
portation problem as the unconstrained optimal transportation prob- 
lem. 

Given $0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ of compact
support, we let $\Gamma(f,g)^{\bar h}$ denote the set of all
$h \in \Gamma(f,g)$ dominated by $\bar h$, that is $h \le \bar h$ almost
everywhere. The set $\Gamma(f,g)^{\bar h}$ is a convex set.



The optimization problem we will be concerned with in this paper,
optimal transportation with capacity constraints, is to minimize the
transportation cost (1) among joint densities $h$ in
$\Gamma(f,g)^{\bar h}$, to obtain the optimal cost under the capacity
constraint $\bar h$

(3) $\inf_{h \in \Gamma(f,g)^{\bar h}} I_c(h)$.

Interpretation. As an example of an optimal transportation problem
in the discrete case [V09, Chapter 3], consider a large number of
bakeries producing loaves of bread that should be transported (by donkeys)
to cafes. The problem is to find where each unit of bread should go 
so as to minimize the transportation cost. The unconstrained optimal 
transportation problem assumes ideal donkeys that can transport any 
amount of bread. The constrained version discussed here takes into 
account the capacity limitations of the donkeys — assuming of course 
that each (cafe, bakery) pair has a donkey at its disposal, and that no 
donkey services more than one cafe and one bakery. 
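
As a hedged illustration of the discrete picture (not taken from the
paper), consider two bakeries and two cafes. The supplies, demands,
costs and capacities below are made-up numbers, and the helper name
`solve_2x2` is hypothetical. With the marginals fixed, a 2 x 2 plan has
a single free parameter $t = h_{00}$, so the capacity-constrained
problem reduces to minimizing an affine function of $t$ over an
interval, and the minimum is attained at an endpoint.

```python
from fractions import Fraction as F

def solve_2x2(f, g, cost, cap):
    """Minimize sum(cost[i][j] * h[i][j]) over 2x2 plans h with row sums f,
    column sums g and 0 <= h[i][j] <= cap[i][j].
    Returns (plan, total_cost), or None if no feasible plan exists."""
    assert f[0] + f[1] == g[0] + g[1], "marginals must have equal total mass"
    # Fixing the marginals leaves one free parameter t = h[0][0]:
    #   h = [[t, f0 - t], [g0 - t, f1 - g0 + t]]
    # The bounds 0 <= h[i][j] <= cap[i][j] translate into an interval for t.
    lo = max(0, f[0] - cap[0][1], g[0] - cap[1][0], g[0] - f[1])
    hi = min(cap[0][0], f[0], g[0], cap[1][1] - f[1] + g[0])
    if lo > hi:
        return None
    # The total cost is affine in t with this slope, so an endpoint is optimal.
    slope = cost[0][0] - cost[0][1] - cost[1][0] + cost[1][1]
    t = lo if slope >= 0 else hi
    plan = [[t, f[0] - t], [g[0] - t, f[1] - g[0] + t]]
    total = sum(cost[i][j] * plan[i][j] for i in range(2) for j in range(2))
    return plan, total

# Made-up data: unit supplies and demands, cost |i - j|^2, capacity 3/4 per route.
f = g = [F(1), F(1)]
cost = [[0, 1], [1, 0]]
cap = [[F(3, 4)] * 2, [F(3, 4)] * 2]
plan, total = solve_2x2(f, g, cost, cap)
```

Without the capacity bound the cheapest plan would ship everything
along the diagonal at zero cost; with capacity 3/4 per route the
diagonal routes saturate and the remaining mass 1/4 is forced onto
each off-diagonal route, for a total cost of 1/2.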
Example 1.1. (Constrained optimal solution concentrates on 'diagonal
tiles in a 2 x 2 checker board' in response to an integer constraint.)

Figure 1. (A) $h$. (B) $\bar h$.

Let $I$ be the closed interval $[-\frac12, \frac12] \subset \mathbb{R}^1$ and
let $f = g = 1_I$ have constant density 1 on $I$ (here $1_I$ is the
characteristic function of the set $I$). Let $\bar h = 2 \cdot 1_{I^2}$
have constant density 2 on $I^2$ (figure 1B). Note that
$0 \le f, g \in L^1_c(I)$ have same total mass 1, and that
$\Gamma(f,g)^{\bar h} \ne \emptyset$ since it contains $1_{I^2}$. Let
$c(x,y) = \frac12 |x - y|^2$. Then, as explained in the appendix,
$I_c(\cdot)$ attains its minimal value on $\Gamma(f,g)^{\bar h}$ at (see
figure 1A)

(4) $h(x,y) := \begin{cases} 2 & \text{on } [-\frac12,0] \times [-\frac12,0] \,\cup\, [0,\frac12] \times [0,\frac12], \\ 0 & \text{otherwise.} \end{cases}$
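
The following sketch is a numerical sanity check of this example, not a
proof of optimality. It assumes the cost $\frac12|x-y|^2$, approximates
integrals by a midpoint rule on a hypothetical 40 x 40 grid, and
compares the checkerboard density (4) with just one other feasible
competitor, the uniform density $1_{I^2}$. All function names are
illustrative.

```python
def midpoint_grid(n):
    """Midpoints of n equal cells covering I = [-1/2, 1/2]."""
    return [-0.5 + (i + 0.5) / n for i in range(n)]

def cost(x, y):
    return 0.5 * (x - y) ** 2

def checkerboard(x, y):
    # density 2 on the two diagonal tiles, 0 on the off-diagonal tiles
    return 2.0 if (x < 0) == (y < 0) else 0.0

def uniform(x, y):
    # the feasible competitor 1_{I^2}
    return 1.0

def total_cost(h, n=40):
    """Midpoint-rule approximation of the integral of cost * h over I^2."""
    pts = midpoint_grid(n)
    return sum(cost(x, y) * h(x, y) for x in pts for y in pts) / n**2

def x_marginal(h, n=40):
    """Midpoint-rule approximation of the x-marginal of h at each grid point."""
    pts = midpoint_grid(n)
    return [sum(h(x, y) for y in pts) / n for x in pts]
```

Both densities have marginals equal to 1 on $I$. A direct computation
gives cost $1/48$ for the checkerboard and $1/12$ for the uniform
density, and the grid values agree with these up to discretization
error.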



Other examples can be derived from this one (see Remark 5.3). Lest
such examples seem obvious, we also pose the following open problem:



Example 1.2. (Open problem.) 

Let $I$, $f$, $g$ and $c$ be as in example 1.1. Let $\bar h = 4 \cdot 1_{I^2}$
have constant density 4 on $I^2$ (figure 2B). After considering example
1.1 it is natural to guess that $I_c(\cdot)$ attains its minimal value on
$\Gamma(f,g)^{\bar h}$ at (see figure 2A)

(5) $h(x,y) := \begin{cases} 4 & \text{on } S, \\ 0 & \text{otherwise,} \end{cases}$

where $S := [-\frac12,-\frac14] \times [-\frac12,-\frac14] \cup [-\frac14,0] \times [-\frac14,0] \cup [0,\frac14] \times [0,\frac14] \cup [\frac14,\frac12] \times [\frac14,\frac12]$.
Surprisingly, this is not the case. The perturbation $\Delta h$ in figure
2C reduces the total cost of $h$. Here '+' represents adding $\delta$
mass and '-' subtracting $\delta$ mass. Since adding/subtracting mass
near the diagonal has negligible cost, the net contribution of
$\Delta h$ is dominated by the four minuses near the four points
$(-\frac14, 0)$, $(0, -\frac14)$, $(0, \frac14)$ and $(\frac14, 0)$. So
$\Delta h$ strictly reduces the total cost of $h$. We don't know the true
optimizer for this example.

Figure 2. (A) $h$. (B) $\bar h$. (C) $\Delta h$.

Example 1.3. (Constrained optimal solution with respect to periodic 
cost concentrates on 'diagonal strip'.) 




Figure 3. (A) $R'$. (B) $R$.

Let $\mathbb{R}^2/\mathbb{Z}^2$ be the periodic unit square, that is
$\mathbb{R}^2$ where $(x,y)$ is identified with $(x',y')$ whenever
$x - x', y - y' \in \mathbb{Z}$, and put the periodic cost function
$c(x,y) = \inf_{n \in \mathbb{Z}} |x - y - n|^2$ on it. Two fundamental
domains are $R$ (see figure 3B), and $R'$ (see figure 3A).

The coordinate change $x' := y + x$, $y' := y - x$ maps $R$ bijectively
onto a square of side-length $\sqrt2$, which can be identified with
$R'$. The cost becomes $c(x',y') = \inf_{n \in \mathbb{Z}} |y' - n|^2$,
which on $R'$ is just $c(x',y') = y'^2$. Note that in the $x', y'$
coordinates, the cost is constant along lines parallel to $x'$. Given
total mass 1 and constant capacity bound $\bar h > 1$ on the periodic
square, let $h_0$ be the density which equals $\bar h$ on the strip of
total mass 1 centered about the diagonal $x'$ (see shaded strip in
figure 3A) and vanishes elsewhere. From the simple form of the cost in
the $x', y'$ coordinates it can be easily seen that $h_0$ is the optimal
way to fit mass 1 into $R'$ while respecting the bound $\bar h$:
$h_0 = \operatorname{argmin}_{h \le \bar h,\ \mathrm{mass}(h) = 1} \int_{R'} c\,h$.
In particular
$h_0 = \operatorname{argmin}_{h \in \Gamma(1_I, 1_I)^{\bar h}} \int_{R'} c\,h$,
where $1_I$ is equal to the marginals of $h_0$. As a function on $R$,
$h_0$ is supported on the shaded region in figure 3B.

Note that the uniqueness result, Theorem 8.1, still applies to this
cost, since it is $C^2$ and non-degenerate outside of two diagonal line
segments on the periodic square.

Motivation. The thing to note from example 1.1 is that at almost
every point of the underlying space, the density $h$ of the optimal
solution is either equal to 0 or to $\bar h$, the density of the
capacity bound. In the language developed below $h$ is geometrically
extreme.

This example is special since the densities involved are both locally
constant. It is easy to see that when $h$ and $\bar h$ are both constant
in a neighbourhood of a point $(x_0,y_0)$, $h(x_0,y_0)$ must either
equal 0 or $\bar h(x_0,y_0)$: if $0 < h(x_0,y_0) < \bar h(x_0,y_0)$ then
a standard perturbation argument (see proof of Lemma 6.1) shows that
$h$ cannot be optimal.

In general $h$ and $\bar h$ are not locally constant. But one of the
main insights we exploit in this paper is that, at an infinitesimal
level, they become constant: if we blow up $h$ and $\bar h$ at a
(Lebesgue) point, the blow-ups have constant densities (see (b) of
Claim 4.2). In effect, blowing up allows us to reduce the general case
to the special case of locally constant densities, as is the case in
example 1.1.

Main result: Existence and Uniqueness. Proving solutions to the
capacity-constrained problem exist (Theorem 3.1) requires very minor
modifications of the direct argument familiar from the unconstrained
case. The main result of this paper is therefore the uniqueness theorem
(Theorem 8.1). It says that under mild assumptions on the cost and
capacity bound, a solution to the capacity-constrained problem is unique.



JONATHAN KORMAN AND ROBERT J. MCCANN

Strategy. Recall that a point of a convex set $\Gamma$ is called an
extreme point if it is not an interior point of any line segment lying
in $\Gamma$. A density $h$ in $\Gamma(f,g)^{\bar h}$ will be called
geometrically extreme (see Definition 6.2) if there exists a
(measurable) set $W \subset \mathbb{R}^{2d}$ such that
$h(x,y) = \bar h(x,y) 1_W(x,y)$ for almost every
$(x,y) \in \mathbb{R}^{2d}$. (Such a density might be called 'bang-bang'
in the optimal control context.) Observe that a density is an extreme
point of $\Gamma(f,g)^{\bar h}$ if and only if it is geometrically
extreme (with respect to $\bar h$).

It is well-known in the theory of linear programming that every
continuous linear functional on a compact convex set attains its
minimum at an extreme point. Our strategy for proving uniqueness in the
problem at hand (Theorem 8.1) will be to show that every optimizer is
geometrically extreme (Theorem 7.2), hence is an extreme point of
$\Gamma(f,g)^{\bar h}$. Since any convex combination of optimizers is
again optimal (but fails to be geometrically extreme), it follows that
no more than one optimizer exists.

Remark. Once a solution is known to be geometrically extreme, the
entire problem is reduced to identifying the geometry of its support $W$.
Example 1.1 shows the boundary of $W$ cannot generally be expected to
be smooth. It is natural to wonder how to characterize $W$, and what
kind of geometric and analytic properties $\partial W$ will generally
possess.

Main assumptions. The main two assumptions for the uniqueness
result are that the capacity constraint $\bar h$ is uniformly bounded,
and that the cost $c(x,y)$ is non-degenerate (in the sense that
$\det D^2_{xy} c(x,y) \ne 0$ in equation (7)). Sufficiency of a local
condition for uniqueness is somewhat of a surprise; cf. the cylindrical
example of [MPW10, p. 10], which suggests that, except in one
dimension, no local hypothesis on the cost function is sufficient to
guarantee uniqueness of minimizer in the unconstrained case.

Remark. Although capacity constraints are quite standard in the
discrete case, they do not seem to have been much considered in the
continuum setting. On the other hand, the work of Brenier [B87] [B91]
marks a turning point in our understanding of unconstrained
transportation in the continuum setting, and we were surprised to
discover that many of the insights gained in that context do not seem
to adapt easily to the capacity-constrained problem.
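2. Notation, conventions, and assumptions

For a differentiable map $T : \mathbb{R}^n \to \mathbb{R}^m$ let $DT$
denote the derivative of $T$, that is the Jacobian matrix of all
partial derivatives

(6) $DT := \left( \frac{\partial T_i}{\partial x_j} \right)_{1 \le i \le m,\ 1 \le j \le n}$.

Let $D^2 c(x,y)$ denote the Hessian of $c$ at the point $(x,y)$, that is
the $2d \times 2d$ matrix of second order partial derivatives of the
function $c$ at $(x,y)$. Let $D^2_{xy} c(x,y)$ denote the $d \times d$
matrix of mixed second order partial derivatives

(7) $D^2_{xy} c(x,y) := \left( \frac{\partial^2 c}{\partial x_i \partial y_j}(x,y) \right)_{1 \le i,j \le d}$.

The $n$-dimensional Lebesgue measure on $\mathbb{R}^n$ will be denoted by
$\mathcal{L}^n$.

Let $\pi_X : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d : (x,y) \mapsto x$
and $\pi_Y : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d : (x,y) \mapsto y$
be the canonical projections. For a density function
$h \in L^1(\mathbb{R}^d \times \mathbb{R}^d)$ denote its marginals by $h_X$
and $h_Y$: $h_X(x) := \int_{\mathbb{R}^d} h(x,y)\,dy$ and
$h_Y(y) := \int_{\mathbb{R}^d} h(x,y)\,dx$.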

2.1. Assumptions on the cost. Consider the following assumptions 
on the cost: 

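(C1) $c(x,y)$ is bounded,

(C2) there is a Lebesgue negligible closed set
$Z \subset \mathbb{R}^d \times \mathbb{R}^d$ such that
$c(x,y) \in C^2(\mathbb{R}^d \times \mathbb{R}^d \setminus Z)$, and

(C3) $c(x,y)$ is non-degenerate: $\det D^2_{xy} c(x,y) \ne 0$ for all
$(x,y) \in \mathbb{R}^d \times \mathbb{R}^d \setminus Z$.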

2.2. Assumptions on the capacity constraint. In section 3 and
from section 5 onwards, we will always assume that $\bar h$ is
measurable and non-negative, has compact support, and is bounded on
$\mathbb{R}^d \times \mathbb{R}^d = \mathbb{R}^{2d}$.

Given marginal densities $0 \le f, g \in L^1_c(\mathbb{R}^d)$ with same
total mass, to avoid talking about the trivial case, we will always
assume that a feasible solution exists:
$\Gamma(f,g)^{\bar h} \ne \emptyset$.
Remark 2.1. To guarantee that the transportation cost $I_c(h)$ is finite
we require $\bar h$ to have compact support: since the cost $c$ is always
assumed continuous and bounded, $\bar h$ having compact support makes
sure that $I_c(h) \le I_c(\bar h) < \infty$ for all $h \le \bar h$. Note
that when $\bar h$ has compact support, so will any density in
$\Gamma(f,g)^{\bar h}$, as well as $f$ and $g$.

3. Existence 

For simplicity we prove existence only in the case when $\bar h$ has
compact support.

Theorem 3.1. (Existence) Assume that the cost $c$ is continuous and
bounded. Take $0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ of
compact support and let $0 \le f, g \in L^1_c(\mathbb{R}^d)$ be marginal
densities for which $\Gamma(f,g)^{\bar h} \ne \emptyset$. Then the
corresponding problem of optimal transportation with capacity
constraints (3) has a solution. That is, $I_c(\cdot)$ attains its
minimum value on $\Gamma(f,g)^{\bar h}$.

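Proof. Let $X, Y \subset \mathbb{R}^d$ be compact subsets such that
$\mathrm{spt}(\bar h) \subset X \times Y$. Note that the support of any
$h \in \Gamma(f,g)^{\bar h}$ is also contained in $X \times Y$, and that
$\mathrm{spt}(f) \subset X$, $\mathrm{spt}(g) \subset Y$.

Since $\bar h$ is bounded and has compact support,
$\bar h \in L^p(X \times Y)$ for all $1 \le p \le \infty$, in particular
$\bar h \in L^2(X \times Y)$. Consequently
$\Gamma(f,g)^{\bar h} \subset L^2(X \times Y)$.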

We shall now specify a topology on $L^2(X \times Y)$ for which
$\Gamma(f,g)^{\bar h}$ is compact and $I_c(\cdot)$ continuous. Existence
then follows from the general fact that a continuous function attains
its minimum on a compact set. For $X$ and $Y$ compact, it is convenient
to use the weak-* topology, as in the unconstrained transportation
problem (e.g. [V03]). Since $L^2$ is reflexive, the weak-* topology is
the same as the weak topology. For the sake of completeness, we outline
the direct argument despite its standard nature.

Give $L^2(X \times Y)$ the weak topology. By the Banach-Alaoglu Theorem
any closed ball $B_r(0)$ of radius $r < \infty$ in $L^2(X \times Y)$ is
weak-*, hence weak, compact. Note that any $h$ with
$0 \le h \le \bar h$ satisfies $\|h\|_2 \le \|\bar h\|_2 =: R < \infty$.
Hence $\Gamma(f,g)^{\bar h}$ is contained in $B_R(0)$. So in order to
show that $\Gamma(f,g)^{\bar h}$ is weakly compact, it is enough to show
that it is weakly closed.

Let $h_n$ be a sequence in $\Gamma(f,g)^{\bar h}$ which converges weakly
to $h_\infty \in B_R(0)$. We want to show
$h_\infty \in \Gamma(f,g)^{\bar h}$, that is that $h_\infty$ is dominated
by $\bar h$ almost everywhere and has $f$ and $g$ as marginals.
Weak convergence means that for all $\psi \in L^2(X \times Y)$,

(8) $\lim_{n \to \infty} \int_{X \times Y} h_n(x,y)\,\psi(x,y) = \int_{X \times Y} h_\infty(x,y)\,\psi(x,y)$.

Since $h_n \le \bar h$, $\int h_n \psi \le \int \bar h \psi$ for all
non-negative $\psi \in L^2$. Letting $n \to \infty$,
$\int h_\infty \psi \le \int \bar h \psi$ for all non-negative
$\psi \in L^2$, hence $h_\infty \le \bar h$ almost everywhere.

It is easy to see that $(h_\infty)_X = f$ by using the definition of
weak convergence (8) with $\psi(x,y) := \varphi(x) 1_Y(y)$, where
$\varphi \in L^2(X)$. A similar calculation shows that
$(h_\infty)_Y = g$. It follows that $\Gamma(f,g)^{\bar h}$ is weakly
closed.

To see $I_c(\cdot) : \Gamma(f,g)^{\bar h} \to \mathbb{R}$ is continuous with
respect to the weak topology use equation (8) with
$\psi(x,y) := c(x,y) 1_{X \times Y}(x,y)$, which is in
$L^2(X \times Y)$ since $c$ is assumed bounded, to conclude that

$I_c(h_\infty) = \int h_\infty c = \lim_{n \to \infty} \int h_n c = \lim_{n \to \infty} I_c(h_n)$.

Existence in the constrained case follows. □



4. Blowing up a density near a Lebesgue point 

When $0 \le h$ is dominated by $\bar h \in L^\infty$ it is also bounded.
Even when $\bar h$ is continuous, $h \in L^1$ may not be continuous as we
have seen in example 1.1; however it is necessarily measurable,
belonging to $L^1$. The notion of a Lebesgue point is a substitute for
the notion of a point of continuity in the measure theoretic context.
In this section we study the behaviour of $h$ near its Lebesgue points.

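Given a Lebesgue point $(x_0,y_0) \in \mathbb{R}^d \times \mathbb{R}^d$ of
$0 \le k \in L^1_c(\mathbb{R}^d \times \mathbb{R}^d)$, consider the constant
function $k_\infty(x,y) := k(x_0,y_0)$ defined on the unit cube
$Q := [-\frac12,\frac12]^d \times [-\frac12,\frac12]^d$. We call
$k_\infty$ the blow-up of $k$ at $(x_0,y_0)$.

Let $Q_n = Q_n(x_0,y_0) := (x_0,y_0) + [-\frac{1}{2n},\frac{1}{2n}]^d \times [-\frac{1}{2n},\frac{1}{2n}]^d$
denote small cubical neighbourhoods of volume $(\frac1n)^{2d}$ centered
at $(x_0,y_0) \in \mathbb{R}^d \times \mathbb{R}^d$.

Let $\varphi_n : Q \to Q_n \subset \mathbb{R}^d \times \mathbb{R}^d$ be given by
$\varphi_n(x,y) = (x_0,y_0) + (\frac{x}{n}, \frac{y}{n})$.
Let $k_n : Q \to \mathbb{R}$ be defined by

(9) $k_n := k \circ \varphi_n$.

It will follow from Claim 4.2 that $k_n$ converges to $k_\infty$ strongly
in $L^1(Q)$.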
Definition 4.1. We call $k_n$ the blow-up sequence of $k$ at
$(x_0,y_0)$. We call its limit $k_\infty$ the blow-up of $k$ at
$(x_0,y_0)$.

We recall some basic facts about Lebesgue points from [Ru87].
Let $f \in L^1(\mathbb{R}^n)$. Any $x \in \mathbb{R}^n$ for which it is true
that

$\lim_{r \to 0} \frac{1}{\mathcal{L}^n[B_r(x)]} \int_{B_r(x)} |f(y) - f(x)|\,dy = 0$

is called a Lebesgue point of $f$. Here $B_r(x)$ denotes the open ball
with center $x$ and radius $r > 0$. At a Lebesgue point the
$L^1$-function $f$ has a well defined value:

$f(x) = \lim_{r \to 0} \frac{1}{\mathcal{L}^n[R_r(x)]} \int_{R_r(x)} f(y)\,dy$.

Here $\{R_r(x)\}$ is any sequence of sets which 'shrink nicely' to $x$
(e.g. cubes, balls).

If $x$ is a point of continuity of $f$ then $x$ is a Lebesgue point of
$f$. In particular, for a continuous function, every point is a
Lebesgue point. Given $f \in L^1(\mathbb{R}^n)$, Lebesgue's Theorem says
that almost every point in $\mathbb{R}^n$ is a Lebesgue point of $f$.

Claim 4.2. Let $(x_0,y_0)$ be a Lebesgue point of
$0 \le k \in L^1_c(\mathbb{R}^d \times \mathbb{R}^d)$. Let $k_n$ denote the
blow-up sequence of $k$ at $(x_0,y_0)$ and let $k_\infty$ denote the
blow-up of $k$ at $(x_0,y_0)$. Then:

(a) $k_n \to k_\infty$ strongly in $L^1(Q)$, i.e.
$\|k_n - k_\infty\|_{L^1(Q)} \to 0$,

(b) $k_n(x,y) = k(\frac1n x + x_0, \frac1n y + y_0)$ on $Q$.

Proof. (a) Letting $\varphi_n$ denote the dilation from (9) yields

$\int_Q |k_n - k_\infty|\,dx\,dy = \int_Q |(k - k(x_0,y_0)) \circ \varphi_n|\,dx\,dy$

$\quad = \frac{1}{\mathcal{L}^{2d}[Q_n]} \int_Q |(k - k(x_0,y_0)) \circ \varphi_n|\,|\det D\varphi_n|\,dx\,dy$

$\quad = \frac{1}{\mathcal{L}^{2d}[Q_n]} \int_{Q_n} |k(x,y) - k(x_0,y_0)|\,dx\,dy \to 0$,

as $n \to \infty$. The first equality is the definition of $k_n$, and the
second equality uses $|\det D\varphi_n(x,y)| = n^{-2d} = \mathcal{L}^{2d}[Q_n]$.
The last equality follows from the change of variable formula and the
limit at the end follows from $(x_0,y_0)$ being a Lebesgue point of $k$.

(b) follows immediately from the definition of $\varphi_n$. □
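
To make the blow-up concrete, the sketch below estimates
$\|k_n - k_\infty\|_{L^1(Q)}$ numerically in the simplest setting
$d = 1$. The density `k`, the base point $(0.3, -0.2)$ and the grid size
are hypothetical choices made only for illustration; the point chosen
is a point of continuity, hence a Lebesgue point.

```python
def blow_up_l1_error(k, x0, y0, n, grid=40):
    """Midpoint-rule estimate of ||k_n - k_inf||_{L^1(Q)} for d = 1, where
    k_n(x, y) = k(x0 + x/n, y0 + y/n) on Q = [-1/2, 1/2]^2 and k_inf = k(x0, y0)."""
    pts = [-0.5 + (i + 0.5) / grid for i in range(grid)]
    k_inf = k(x0, y0)
    total = sum(abs(k(x0 + x / n, y0 + y / n) - k_inf) for x in pts for y in pts)
    return total / grid**2

def k(x, y):
    # hypothetical smooth density, positive near the chosen base point
    return 1.0 + x * y + x * x

# L^1 distance between the n-th blow-up and the constant blow-up limit
errors = [blow_up_l1_error(k, 0.3, -0.2, n) for n in (1, 2, 4, 8)]
```

For this smooth `k` the estimated distances shrink as $n$ grows,
roughly in proportion to $1/n$, in line with part (a) of the claim.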

For later use we record the following immediate consequence of the
above claim.

Remark 4.3. Let $0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$
have compact support. Suppose that $0 \le h \le \bar h$ and that
$(x_0,y_0)$ is a common Lebesgue point of $h$ and $\bar h$. Then,
letting $h_n$ and $\bar h_n$ denote the blow-up sequences of $h$ and
$\bar h$ at $(x_0,y_0)$,

(a) $h_\infty(x,y) = h(x_0,y_0)$ on $Q$ and
$\|h_n - h_\infty\|_{L^1(Q)} \to 0$, and

(b) $\bar h_\infty(x,y) = \bar h(x_0,y_0)$ on $Q$ and
$\|\bar h_n - \bar h_\infty\|_{L^1(Q)} \to 0$.

The following proposition clarifies the nature of convergence of $h_n$
on $Q$. It says that (for a subsequence $n(i)$) $Q$ can be partitioned
into a 'good' set, $F_n$, and a 'bad' set, $E_n$. On the good sets $h_n$
converges 'uniformly' to $h(x_0,y_0)$ while on the bad sets it is
uniformly bounded; the good sets are large and the bad are small.
Recall that the function $\bar h$ is assumed to be bounded and that
$Q_1$ is compact.

Proposition 4.4. Let
$0 \le \bar h \in L^1 \cap L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$. Suppose
that $0 \le h \le \bar h$ almost everywhere, and let $h_n$ denote the
blow-up sequence of $h$ at a Lebesgue point $(x_0,y_0)$. For some
subsequence indexed by $n \in N_0 = \{n_1 < n_2 < \cdots\}$ there exist
non-negative real numbers $a_n \to 0$, and Borel subsets $E_n$ and
$F_n := Q \setminus E_n$ of $Q$, such that

(a) $0 \le \mathcal{L}^{2d}[E_n] < a_n$,

(b) $\|h_n(x,y) - h(x_0,y_0)\|_{L^\infty(F_n)} < a_n$,

(c) $|h_n(x,y)| \le \|\bar h\|_{L^\infty(Q_1)}$ for almost every
$(x,y) \in Q$.

Proof of (a)-(b). By Remark 4.3, $h_k \to h_\infty = h(x_0,y_0)$
strongly in $L^1(Q)$, i.e. $\|h_k - h_\infty\|_{L^1(Q)} \to 0$. It
follows that a subsequence converges pointwise to $h_\infty$ almost
everywhere on $Q$; for example, choosing
$\|h_{k_i} - h_{k_{i+1}}\|_{L^1(Q)} < 2^{-i}$ is known to assure this
[LL01, Theorem 2.7]. By Egoroff's Theorem, for any natural number $m$,
there exists an open subset $E_m \subset Q$ such that
$0 \le \mathcal{L}^{2d}[E_m] < \frac1m$ and
$\|h_{k_i} - h_\infty\|_{L^\infty(F_m)} \to 0$ as $i \to \infty$, where
$F_m := Q \setminus E_m$. Hence, for $i = i_m$ large enough,

(10) $\|h_{k_{i_m}} - h_\infty\|_{L^\infty(F_m)} < \frac1m$.

Note that without loss of generality we can assume that
$k_{i_1} < k_{i_2} < k_{i_3} < \cdots$. Let
$N_0 := \{k_{i_m} \mid m \in \mathbb{N}\}$. Relabeling indices by
$n \in N_0$: $n := k_{i_m}$, $E_n := E_m$, $F_n := F_m$, and letting
$a_n := \frac1m$, the above equation becomes, for all $n \in N_0$,

(11) $\|h_n - h_\infty\|_{L^\infty(F_n)} < a_n$.

Proof of (c). For almost every $(x,y) \in Q$ and all $n \in N_0$ we have
by (b) of Claim 4.2
$h_n(x,y) = h(\frac1n x + x_0, \frac1n y + y_0) \le \bar h(\frac1n x + x_0, \frac1n y + y_0) \le \|\bar h\|_{L^\infty(Q_1)}$. □
We also need a similar but more delicate result concerning
convergence of the marginals of $h_n$. Recall that
$Q = [-\frac12,\frac12]^d \times [-\frac12,\frac12]^d$.

Proposition 4.5. Let $h_n$ be the blow-up sequence of
$0 \le h \in L^1 \cap L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ at a Lebesgue
point $(x_0,y_0)$ and let $f_n := (h_n)_X$ and $g_n := (h_n)_Y$ be the
corresponding marginals. Taking $N_0$ (see Proposition 4.4) smaller if
necessary yields a further subsequence indexed by $n \in N_0$, with
Borel subsets $X_n^{bad}, Y_n^{bad} \subset [-\frac12,\frac12]^d$ and
$X_n^{good} := [-\frac12,\frac12]^d \setminus X_n^{bad}$ and
$Y_n^{good} := [-\frac12,\frac12]^d \setminus Y_n^{bad}$ such that,

(a) $\lim_{n \to \infty} \mathcal{L}^d[X_n^{bad}] = 0$ and
$\lim_{n \to \infty} \mathcal{L}^d[Y_n^{bad}] = 0$;

(b) $\lim_{n \to \infty} \|f_n - h(x_0,y_0)\|_{L^\infty(X_n^{good})} = 0 = \lim_{n \to \infty} \|g_n - h(x_0,y_0)\|_{L^\infty(Y_n^{good})}$;

(c) If $0 \le h \le \bar h$ as in Proposition 4.4, then
$f_n \le \|\bar h\|_{L^\infty(Q_1)}$ and
$g_n \le \|\bar h\|_{L^\infty(Q_1)}$ on $[-\frac12,\frac12]^d$.



Proof of (a)-(b). Let us start with the subsequence $h_n$ from
Proposition 4.4. Its marginals $f_n$ and $f_\infty$ are given by
$f_n(x) := \int_{[-\frac12,\frac12]^d} h_n(x,y)\,dy$ and
$f_\infty(x) := \int_{[-\frac12,\frac12]^d} h_\infty(x,y)\,dy = h(x_0,y_0)$.
The marginals $g_n$ and $g_\infty$ are defined similarly.

By (a) of Claim 4.2, $\|h_k - h_\infty\|_{L^1(Q)} \to 0$. It follows that
$\|f_k - f_\infty\|_{L^1([-\frac12,\frac12]^d)} \to 0$ and
$\|g_k - g_\infty\|_{L^1([-\frac12,\frac12]^d)} \to 0$. Let $f_{k_i}$ and
$g_{k_i}$ be subsequences satisfying
$\|f_{k_i} - f_{k_{i+1}}\|_{L^1([-\frac12,\frac12]^d)} < 2^{-i}$ and
$\|g_{k_i} - g_{k_{i+1}}\|_{L^1([-\frac12,\frac12]^d)} < 2^{-i}$.

As in the proof of Proposition 4.4, Theorem 2.7 of [LL01] and
Egoroff's Theorem imply existence of open subsets
$X_m^{bad}, Y_m^{bad} \subset [-\frac12,\frac12]^d$ ($m \in \mathbb{N}$)
satisfying $0 \le \mathcal{L}^d[X_m^{bad}], \mathcal{L}^d[Y_m^{bad}] < \frac1m$,
such that for $i = i_m$ large enough,
$\|f_{k_{i_m}} - f_\infty\|_{L^\infty(X_m^{good})}, \|g_{k_{i_m}} - g_\infty\|_{L^\infty(Y_m^{good})} < \frac1m$.
Relabeling indices, as in the proof of Proposition 4.4, we get for all
$n$ in an index set $N_0$:
$0 \le \mathcal{L}^d[X_n^{bad}], \mathcal{L}^d[Y_n^{bad}] < a_n$ and
$\|f_n - f_\infty\|_{L^\infty(X_n^{good})}, \|g_n - g_\infty\|_{L^\infty(Y_n^{good})} < a_n$.
(a) and (b) follow since $a_n \to 0$ as $n \to \infty$.

Proof of (c). Follows immediately from (c) of Proposition 4.4 and
the formula $f_n(x) = \int_{[-\frac12,\frac12]^d} h_n(x,y)\,dy$. □



5. Optimality is inherited by blow-up sequence 

When $h$ is optimal among densities which share its marginals and
which are dominated by $\bar h$, i.e.
$h \in \operatorname{argmin}_{k \in \Gamma(h_X,h_Y)^{\bar h}} I_c(k)$, we show
that $h_n$ is (almost) optimal among densities which share its
marginals and which are dominated by $\bar h_n$, i.e.
$h_n \in \operatorname{argmin}_{k \in \Gamma((h_n)_X,(h_n)_Y)^{\bar h_n}} I_{c_n}(k)$.

We first record what conditions (C2)-(C3) of subsection 2.1 on
the cost imply about the Taylor expansion of $c$. Suppose the first and
second derivatives of $c(x,y)$ exist at $(x_0,y_0)$ and consider the
2nd-order Taylor expansion of $c$ near $(x_0,y_0)$:

(12) $c(x_0 + \frac{x}{n}, y_0 + \frac{y}{n}) = c(x_0,y_0) + \sum_{i=1}^d \frac{\partial c}{\partial x_i}(x_0,y_0)\frac{x_i}{n} + \sum_{i=1}^d \frac{\partial c}{\partial y_i}(x_0,y_0)\frac{y_i}{n}$
$\qquad + \frac{1}{n^2}\Big\{ \frac12 x^T D^2_{xx}c(x_0,y_0)\,x + \frac12 y^T D^2_{yy}c(x_0,y_0)\,y + x^T D^2_{xy}c(x_0,y_0)\,y + R_2(\frac{x}{n},\frac{y}{n}) \Big\}$.

Here $R_2$ is $n^2$ times the 2nd-order Lagrange remainder
$\frac{1}{n^2}R_2(\frac{x}{n},\frac{y}{n})$, which satisfies
$\|\frac{1}{n^2}R_2(\frac{x}{n},\frac{y}{n})\|_{L^\infty(Q)} = o(\frac{1}{n^2})$
(e.g. see [Sp80, Theorem 19.1] for the 1-dimensional case). Hence
$\|R_2(\frac{x}{n},\frac{y}{n})\|_{L^\infty(Q)} = o(1)$ as $n \to \infty$.

When $D^2_{xy}c(x_0,y_0)$ is non-degenerate, changing the $y$
coordinates by $y_{new} = D^2_{xy}c(x_0,y_0)\,y_{old}$ gives, without loss
of generality, that $D^2_{xy}c(x_0,y_0) = I$. Hence without loss of
generality we can assume that $x^T D^2_{xy}c(x_0,y_0)\,y$, the mixed
2nd-order term in equation (12), is equal to
$\tilde c(x,y) := x \cdot y$. In other words, after an appropriate change
of coordinates equation (12) assumes the form:

(13) $c(x_0 + \frac{x}{n}, y_0 + \frac{y}{n}) = $ constant term $+$ terms involving $x$ alone
$\qquad +$ terms involving $y$ alone $+ \frac{1}{n^2}\{\tilde c(x,y) + R_2(\frac{x}{n},\frac{y}{n})\}$.

For $c_n(x,y) := \tilde c(x,y) + R_2(\frac{x}{n},\frac{y}{n})$ let
$I_{c_n}(k)$ denote $\int_Q c_n(x,y)\,k(x,y)$, and for
$\tilde c(x,y) = x \cdot y$ let $I_{\tilde c}(k)$ denote
$\int_Q \tilde c(x,y)\,k(x,y)$. Note that
$I_{c_n}(k) = \int_Q \tilde c\,k + \int_Q R_2\,k$ and
$|\int_Q R_2\,k| \le \int_Q |R_2||k| \le \|R_2\|_{L^\infty(Q)} \|k\|_{L^1(Q)}$.
Hence given a fixed constant $M > 0$, we have for all $k \in L^1(Q)$
whose total mass $\|k\|_{L^1(Q)} \le M$,

(14) $I_{c_n}(k) = I_{\tilde c}(k) + o(1)$.

Remark 5.1. Note that when the cost satisfies (C2)-(C3) of
subsection 2.1, for every
$(x_0,y_0) \in \mathbb{R}^d \times \mathbb{R}^d \setminus Z$ the first and second
derivatives of $c(x,y)$ exist at $(x_0,y_0)$ and $D^2_{xy}c(x_0,y_0)$ is
non-degenerate.

In [V09, Theorem 4.6] it is shown that unconstrained optimality is
inherited by restriction to (measurable) subsets: if the restricted plan
is not optimal, then it can be improved, but any improvement in the
restricted plan carries over to an improvement in the original optimal
plan, which is not possible. In the constrained context, optimality is
not necessarily inherited by an arbitrary restriction. To see this,
recall example 1.1 where the optimal constrained solution is given by
$h$ in equation (4). Note that the restriction of $h$ to
$[0,\frac14] \times [\frac14,\frac12] \cup [\frac14,\frac12] \times [0,\frac14]$
is not optimal: restricting $h$ to
$[0,\frac14] \times [0,\frac14] \cup [\frac14,\frac12] \times [\frac14,\frac12]$
has the same marginals but lower cost.

The following lemma says that in the constrained case, optimality is
inherited when the restriction is to a rectangular set. This is used in
the proof of Proposition 5.4.

Lemma 5.2. Let $0 \le h \in L^1_c(\mathbb{R}^d \times \mathbb{R}^d)$ be optimal
among densities in $\Gamma(h_X,h_Y)^{\bar h}$ with respect to a cost
function $c$. Consider a rectangular neighbourhood
$A \times B \subset \mathbb{R}^d \times \mathbb{R}^d$ where $A$ and $B$ are Borel
subsets of $\mathbb{R}^d$, and let $\tilde h$ denote $h|_{A \times B}$, the
restriction of $h$ to $A \times B$. Then $\tilde h$ is optimal among
densities in $\Gamma(\tilde h_X, \tilde h_Y)^{\bar h}$ with respect to the
same cost $c$.

Proof. If $\tilde h$ is not optimal, then there exists a plan
$h' \in \Gamma(\tilde h_X, \tilde h_Y)^{\bar h}$ improving $\tilde h$. Note
that $\tilde h$ and $h'$ are both supported on the rectangular
neighbourhood $A \times B$. Now consider the plan $h - \tilde h + h'$
which improves $h$. Since $\tilde h$ and $h'$ have the same marginals,
$h - \tilde h + h' \in \Gamma(h_X,h_Y)$. Note that

$h - \tilde h + h' = \begin{cases} h' & \text{on } A \times B, \\ h & \text{otherwise,} \end{cases}$

and that $h - \tilde h + h' \le \bar h$. It follows that the improved
plan $h - \tilde h + h' \in \Gamma(h_X,h_Y)^{\bar h}$, contradicting
optimality of $h$. □

Remark 5.3. By the above lemma, restricting the optimal density of
example 1.1 to rectangular sets gives more examples of optimal
densities.

Proposition 5.4. Let the cost $c(x,y)$ satisfy conditions (C1)-(C3)
of subsection 2.1. Let
$0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ have compact support
and suppose that $\Gamma(f,g)^{\bar h} \ne \emptyset$. Make a linear
change of coordinates if necessary so that (13) holds. Take
$h \in \Gamma(f,g)^{\bar h}$ and let
$(x_0,y_0) \in \mathbb{R}^d \times \mathbb{R}^d \setminus Z$ be a Lebesgue point
of $h$. Consider the blow-up sequence $h_n$ of $h$ at $(x_0,y_0)$. Then
$h$ $c$-optimal implies that $h_n$ is $c_n$-optimal, where
$c_n(x,y) = \tilde c(x,y) + R_2(\frac{x}{n},\frac{y}{n})$:

$h \in \operatorname{argmin}_{k \in \Gamma(f,g)^{\bar h}} I_c(k) \implies h_n \in \operatorname{argmin}_{k \in \Gamma((h_n)_X,(h_n)_Y)^{\bar h_n}} I_{c_n}(k)$.
Proof. Let $Q_n = Q_n(x_0,y_0)$ and consider the blow-up process as
being done in two steps: restriction ($h'_n := h|_{Q_n}$) and dilation
($h_n := h'_n \circ \varphi_n$). In the restriction stage $h$ is
restricted to the rectangular neighbourhood $Q_n$, hence by Lemma 5.2
$h'_n$ is optimal:

$h \in \operatorname{argmin}_{k \in \Gamma(h_X,h_Y)^{\bar h}} \int c(x,y)\,k(x,y) \implies h'_n \in \operatorname{argmin}_{k \in \Gamma((h'_n)_X,(h'_n)_Y)^{\bar h}} \int_{Q_n} c(x,y)\,k(x,y)$.

In the dilation stage $h'_n$ is composed with the linear map
$\varphi_n : Q \to Q_n : (x,y) \mapsto (\frac1n x + x_0, \frac1n y + y_0)$.
Note that $\det D\varphi_n(x,y) = n^{-2d}$. By the change of variables
formula,

$\int_{Q_n} c(x,y)\,h'_n(x,y) = \int_{Q_n} c(x,y)\,(h_n \circ \varphi_n^{-1})(x,y)$

$\quad = \int_Q c(\varphi_n(x,y))\,h_n(x,y)\,|\det D\varphi_n(x,y)|$

$\quad = \frac{1}{n^{2d}} \int_Q c(x_0 + \frac{x}{n}, y_0 + \frac{y}{n})\,h_n(x,y)$,

and so,

$h_n \in \operatorname{argmin}_{k \in \Gamma((h_n)_X,(h_n)_Y)^{\bar h_n}} \frac{1}{n^{2d}} \int_Q c(x_0 + \frac{x}{n}, y_0 + \frac{y}{n})\,k(x,y)$

$\quad = \operatorname{argmin}_{k \in \Gamma((h_n)_X,(h_n)_Y)^{\bar h_n}} \frac{1}{n^{2d+2}} \int_Q \{\tilde c(x,y) + R_2(\frac{x}{n},\frac{y}{n})\}\,k(x,y)$

$\quad = \operatorname{argmin}_{k \in \Gamma((h_n)_X,(h_n)_Y)^{\bar h_n}} \int_Q c_n(x,y)\,k(x,y)$.

For the step passing from $c$ to $\tilde c + R_2$ above, note that those
terms of the Taylor expansion (13) which are constant, are functions of
$x$ alone, or are functions of $y$ alone give the same value when
integrated against any density $k$ in $\Gamma((h_n)_X,(h_n)_Y)$ since
the marginals are fixed. Hence for the variational problem at hand only
the mixed 2nd-order terms in the Taylor series, namely $\tilde c(x,y)$,
and the remainder, $R_2(\frac{x}{n},\frac{y}{n})$, matter. For the last
equality above, recall that
$\operatorname{argmin} \int \cdot = \operatorname{argmin}\, m \int \cdot$ for any
positive constant $m$. □
6. Is optimality inherited by blow-ups?

It is natural to ask whether the blow-up $h_\infty$ of an optimal $h$ is
also optimal (among densities which share its marginals and which are
dominated by the blow-up of $\bar h$). For our purposes we do not need
to have a complete answer to this question. Instead, we derive a
necessary condition for $h_\infty$ to be (almost) optimal. In section 7
we show this condition is satisfied when $h$ is optimal.

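Lemma 6.1. Let the cost $c(x,y)$ satisfy conditions (C2)-(C3) of
subsection 2.1. Let
$0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ have compact support
and suppose that $\Gamma(f,g)^{\bar h} \ne \emptyset$. Take
$h \in \Gamma(f,g)^{\bar h}$ and let
$(x_0,y_0) \in \mathbb{R}^d \times \mathbb{R}^d \setminus Z$ be a common Lebesgue
point of $h$ and $\bar h$. Let $h_\infty, \bar h_\infty \in L^1(Q)$ be the
blow-ups of $h, \bar h$ at $(x_0,y_0)$. If
$0 < h(x_0,y_0) < \bar h(x_0,y_0)$ then
$h_\infty \in \Gamma(f_\infty,g_\infty)^{\bar h_\infty}$ can be improved: for
any $\delta$ which satisfies
$0 < \delta < \min(h(x_0,y_0), \bar h(x_0,y_0) - h(x_0,y_0))$ there
exists $h^\delta_\infty \in \Gamma(f_\infty,g_\infty)^{\bar h_\infty}$ such that
$I_{\tilde c}(h^\delta_\infty) < I_{\tilde c}(h_\infty)$. Furthermore,
$h(x_0,y_0) - \delta \le h^\delta_\infty \le h(x_0,y_0) + \delta$ on $Q$.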

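Proof. Suppose that $0 < h(x_0,y_0) < \bar h(x_0,y_0)$ at $(x_0,y_0)$.
By Remark 4.3, $h_\infty$ is equal to the constant function
$r = h(x_0,y_0)$ almost everywhere on $Q$. Its marginals
$f_\infty = (h_\infty)_X$ and $g_\infty = (h_\infty)_Y$ are both equal to $r$
almost everywhere on $[-\frac12,\frac12]^d$. Also by Remark 4.3,
$\bar h_\infty$ is equal to the constant function $R = \bar h(x_0,y_0)$
almost everywhere on $Q$. By our assumption $0 < r < R$.

We next recall a standard perturbation argument (e.g. [GM96, proof
of Theorem 2.3]) to show that $r$ is not optimal among densities
$k \in \Gamma(r,r)$ constrained by $R$, where optimality is measured
against $\tilde c(x,y) = x \cdot y$. Pick two points $(x_1,y_1)$ and
$(x_2,y_2)$ in $Q$ such that
$\tilde c(x_1,y_1) + \tilde c(x_2,y_2) < \tilde c(x_1,y_2) + \tilde c(x_2,y_1)$.
Since $\tilde c(x,y)$ is continuous, there exist (compact)
neighbourhoods $U_j \subset [-\frac12,\frac12]^d$ of $x_j$ and
$V_j \subset [-\frac12,\frac12]^d$ of $y_j$ such that
$\tilde c(u_1,v_1) + \tilde c(u_2,v_2) < \tilde c(u_1,v_2) + \tilde c(u_2,v_1)$
whenever $u_j \in U_j$ and $v_j \in V_j$. It follows that
$U_1 \cap U_2 = \emptyset$ and $V_1 \cap V_2 = \emptyset$. Shrinking the
neighbourhoods if necessary, we may also take
$\mathcal{L}^d[U_1] = \mathcal{L}^d[U_2]$ and
$\mathcal{L}^d[V_1] = \mathcal{L}^d[V_2]$, which is what makes the marginals
of the perturbation below vanish. Take $0 < \delta < \min(r, R - r)$ and
consider the density $\Delta h$ which is equal to $\delta$ on
$U_1 \times V_1$, $U_2 \times V_2$, is $-\delta$ on $U_1 \times V_2$,
$U_2 \times V_1$ and is 0 everywhere else. Note that $h_\infty = r$ and
$h^\delta_\infty := r + \Delta h$ have the same marginals, and that
$0 < r - \delta \le h^\delta_\infty \le r + \delta < R$ by choice of
$\delta$, so $h^\delta_\infty \in \Gamma(f_\infty,g_\infty)^{\bar h_\infty}$.
By the choice of the points $(x_1,y_1)$ and $(x_2,y_2)$,
$h^\delta_\infty = r + \Delta h$ has lower cost than $h_\infty = r$:
$I_{\tilde c}(h^\delta_\infty) < I_{\tilde c}(h_\infty)$. □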
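
The perturbation argument can be checked numerically in the simplest
case $d = 1$. In the sketch below the values $r = 1$, $R = 2$,
$\delta = 0.5$, the four intervals and the 20 x 20 midpoint grid are
all hypothetical choices made for illustration; the intervals have
equal length so that the perturbation leaves both marginals unchanged.

```python
def in_box(x, y, box):
    (a, b), (c, d) = box
    return a < x < b and c < y < d

def perturbed_density(r, delta, plus_boxes, minus_boxes):
    """h(x, y) = r + delta on plus_boxes, r - delta on minus_boxes, r elsewhere."""
    def h(x, y):
        if any(in_box(x, y, b) for b in plus_boxes):
            return r + delta
        if any(in_box(x, y, b) for b in minus_boxes):
            return r - delta
        return r
    return h

def cost_and_marginals(h, grid=20):
    """Midpoint-rule cost for c~(x, y) = x*y on Q = [-1/2, 1/2]^2, plus both marginals."""
    pts = [-0.5 + (i + 0.5) / grid for i in range(grid)]
    cost = sum(x * y * h(x, y) for x in pts for y in pts) / grid**2
    fx = [sum(h(x, y) for y in pts) / grid for x in pts]
    gy = [sum(h(x, y) for x in pts) / grid for y in pts]
    return cost, fx, gy

U1, U2 = (-0.4, -0.2), (0.2, 0.4)   # neighbourhoods of x1 = -0.3 and x2 = 0.3
V1, V2 = (0.2, 0.4), (-0.4, -0.2)   # neighbourhoods of y1 = 0.3 and y2 = -0.3
r, R, delta = 1.0, 2.0, 0.5          # 0 < delta < min(r, R - r)
h_inf = lambda x, y: r
h_delta = perturbed_density(r, delta, [(U1, V1), (U2, V2)], [(U1, V2), (U2, V1)])
```

Here $x_1 y_1 + x_2 y_2 = -0.18 < 0.18 = x_1 y_2 + x_2 y_1$, so adding
mass on $U_1 \times V_1$, $U_2 \times V_2$ and removing it from
$U_1 \times V_2$, $U_2 \times V_1$ lowers the cost while keeping the
density between $r - \delta$ and $r + \delta$.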

Definition 6.2. (Geometrically Extreme.) Let $\bar h$ be bounded. A
density $h$ in $\Gamma(f,g)^{\bar h}$ will be called geometrically
extreme if there exists a ($\mathcal{L}^{2d}$-measurable) set
$W \subset \mathbb{R}^{2d}$ such that $h(x,y) = \bar h(x,y) 1_W(x,y)$ for
almost every $(x,y) \in \mathbb{R}^{2d}$. Here $1_W$ is the characteristic
function of the set $W$.

Corollary 6.3. (A necessary condition for optimality of $h_\infty$.)
Let the cost $c(x,y)$ satisfy conditions (C1)-(C3) of subsection 2.1.
Let $0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ have compact
support and assume that $\Gamma(f,g)^{\bar h} \ne \emptyset$. Take
$h \in \Gamma(f,g)^{\bar h}$. If $h_\infty$ is $\tilde c$-optimal at almost
every $(x_0,y_0)$, i.e.
$h_\infty \in \operatorname{argmin}_{k \in \Gamma(f_\infty,g_\infty)^{\bar h_\infty}} I_{\tilde c}(k)$,
then $h$ is geometrically extreme.

Proof. Let $N := \{(x,y) \in \mathbb{R}^d \times \mathbb{R}^d \setminus Z \mid (x,y)$
is a common Lebesgue point of $h$ and $\bar h\}$, where $Z$ is the
Lebesgue negligible set of subsection 2.1. Recall that almost every
point in $\mathbb{R}^d \times \mathbb{R}^d$ is in $N$. Being $\tilde c$-optimal,
$h_\infty$ cannot be improved. Hence by Lemma 6.1 $h(x_0,y_0)$ is either
equal to 0 or equal to $\bar h(x_0,y_0)$ at each point
$(x_0,y_0) \in N$. In other words, $h$ is geometrically extreme. □

7. Optimality implies being geometrically extreme 

The following lemma will be used in the proof of Theorem 7.2. Given two not necessarily positive marginal densities $f, g \in L^1$ with the same total mass $\int f = \int g$, we would like to produce a joint density $h$ which is controlled by $f$ and $g$. Since $f$ and $g$ are not necessarily positive, it is possible for their total mass to be zero even when the densities themselves are not identically zero. In such a case the product $f(x)g(y)$ does not necessarily have $f$ and $g$ as its marginals. The following lemma addresses this issue.

Let $\phi[Z] := \int_Z \phi(z)\,dz$ denote the total mass of the function $\phi$ on the set $Z$.

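Lemma 7.1. Let $X, Y$ be Borel subsets of $[-\frac12, \frac12]^d$ whose $\mathcal{L}^d$-measure is strictly positive. Let $f \in L^1(X)$, $g \in L^1(Y)$ have same total mass $m := f[X] = g[Y] \in \mathbb{R}$. Suppose $|f|, |g| < \epsilon$. Then there exists a joint density $h \in L^1(X \times Y)$ with marginals $f$ and $g$ such that $\|h\|_{L^\infty(X \times Y)} < 3\epsilon \left( \frac{1}{\mathcal{L}^d[X]} + \frac{1}{\mathcal{L}^d[Y]} \right)$.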

Proof. Let $f_0 := \frac{1_X}{\mathcal{L}^d[X]} \in L^1(X)$ and $g_0 := \frac{1_Y}{\mathcal{L}^d[Y]} \in L^1(Y)$. Note that $f_0$ and $g_0$ have total mass 1. We first deal with the case $m = 0$. Note that $(f \cdot g_0)_X = f$, while $(f \cdot g_0)_Y = 0$. Similarly, $(f_0 \cdot g)_X = 0$, while $(f_0 \cdot g)_Y = g$. Let $h := f \cdot g_0 + f_0 \cdot g$. Since the maps $(\cdot)_X$ and $(\cdot)_Y$ are linear, we get that $h_X = f$ and $h_Y = g$.

More generally, suppose the total mass $m = f[X] = g[Y]$ is not necessarily 0. Let $h := f \cdot g_0 + f_0 \cdot g - m f_0 \cdot g_0 = (f - m f_0) \cdot g_0 + f_0 \cdot (g - m g_0) + m f_0 \cdot g_0$. Since the total mass of $f - m f_0$ and of $g - m g_0$ is 0, we conclude by the above that $h_X = (f - m f_0) + 0 + m f_0 = f$, and $h_Y = 0 + (g - m g_0) + m g_0 = g$. For $(x, y) \in X \times Y$ the density $h$ satisfies:

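\[
\begin{aligned}
|h(x, y)| &\le |(f - m f_0)(x)|\,|g_0(y)| + |f_0(x)|\,|(g - m g_0)(y)| + |m|\,|f_0(x)|\,|g_0(y)| \\
&\le \big(|f(x)| + |m|\,|f_0(x)|\big)|g_0(y)| + \big(|g(y)| + |m|\,|g_0(y)|\big)|f_0(x)| + |m|\,|f_0(x)|\,|g_0(y)| \\
&\le \frac{1}{\mathcal{L}^d[Y]}\Big(|f(x)| + \frac{|m|}{\mathcal{L}^d[X]}\Big) + \frac{1}{\mathcal{L}^d[X]}\Big(|g(y)| + \frac{|m|}{\mathcal{L}^d[Y]}\Big) + \frac{|m|}{\mathcal{L}^d[X]\,\mathcal{L}^d[Y]} \\
&< \frac{2\epsilon}{\mathcal{L}^d[Y]} + \frac{2\epsilon}{\mathcal{L}^d[X]} + \frac{\epsilon}{\mathcal{L}^d[X]} \\
&\le 3\epsilon \Big( \frac{1}{\mathcal{L}^d[X]} + \frac{1}{\mathcal{L}^d[Y]} \Big).
\end{aligned}
\]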

The penultimate inequality above uses that $|m| = |\int_X f(x)\,dx| \le \int_X |f(x)|\,dx \le \epsilon \mathcal{L}^d[X]$, and $|m| = |\int_Y g(y)\,dy| \le \int_Y |g(y)|\,dy \le \epsilon \mathcal{L}^d[Y]$.

□ 
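The construction in the proof of Lemma 7.1 is easy to test on a grid. The sketch below assumes $X$ and $Y$ are unions of equal cells (widths `dx` and `dy`, both hypothetical discretization parameters) and that `f` and `g` are lists of cell values. It builds $h = f \cdot g_0 + f_0 \cdot g - m f_0 \cdot g_0$ and recomputes its marginals; the helper names are illustrative only.

```python
def joint_with_marginals(f, g, dx, dy):
    """Return h[i][j] = f_i*g0 + f0*g_j - m*f0*g0 on a uniform grid.

    f, g   : cell values of the (signed) marginal densities on X and Y
    dx, dy : cell widths, so L[X] = len(f)*dx and L[Y] = len(g)*dy
    """
    LX, LY = len(f) * dx, len(g) * dy
    m = sum(f) * dx                        # total mass of f
    assert abs(m - sum(g) * dy) < 1e-12    # f and g must share their mass
    f0, g0 = 1.0 / LX, 1.0 / LY            # normalized indicators of X, Y
    return [[fx * g0 + f0 * gy - m * f0 * g0 for gy in g] for fx in f]


def marginals(h, dx, dy):
    """Integrate h over y (giving h_X) and over x (giving h_Y)."""
    hX = [sum(row) * dy for row in h]
    hY = [sum(h[i][j] for i in range(len(h))) * dx for j in range(len(h[0]))]
    return hX, hY


if __name__ == "__main__":
    f = [0.3, -0.1, -0.2, 0.4]   # signed density on X, total mass 0.1
    g = [0.5, -0.3]              # signed density on Y, total mass 0.1
    h = joint_with_marginals(f, g, dx=0.25, dy=0.5)
    print(marginals(h, 0.25, 0.5))
```

With these sample values $|f|, |g| < \epsilon = 0.6$ and $\mathcal{L}^d[X] = \mathcal{L}^d[Y] = 1$, so the lemma predicts $|h| < 3.6$.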

Theorem 7.2. Let the cost $c(x, y)$ satisfy conditions (C1)-(C3) of subsection 2.1. Let $0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ have compact support and take $0 \le f, g \in L^1_c(\mathbb{R}^d)$ such that $\Gamma(f, g)^{\bar h} \ne \emptyset$. If $h \in \Gamma(f, g)^{\bar h}$ is optimal, i.e. $h \in \operatorname{argmin}_{k \in \Gamma(f, g)^{\bar h}} I_c(k)$, then $h$ is geometrically extreme.

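Proof. Let $N := \{(x, y) \in \mathbb{R}^d \times \mathbb{R}^d \setminus Z \mid (x, y)$ is a common Lebesgue point of $h$ and $\bar h\}$, where $Z$ is the Lebesgue negligible set of subsection 2.1. Note that almost every point in $\mathbb{R}^d \times \mathbb{R}^d$ is in $N$.

Fix $(x_0, y_0) \in N$ and let $h_n$ and $\bar h_n$ be the blow-up sequences of $h$ and $\bar h$ at $(x_0, y_0)$, with $h_\infty$ and $\bar h_\infty$ their respective limits in $L^1$. Suppose by contradiction that $0 < h(x_0, y_0) < \bar h(x_0, y_0)$.

Let $\bar R := \|\bar h\|_{L^\infty(Q_1)}$, $R := \bar h(x_0, y_0)$, and $r := h(x_0, y_0)$. By Lemma 6.1, for $0 < \delta < \min(r, R - r)$, there exists $h^\delta_\infty \in \Gamma(f_\infty, g_\infty)^{\bar h_\infty}$ such that $r - \delta \le h^\delta_\infty \le r + \delta$ and

(15) $I_{\tilde c}(h^\delta_\infty) < I_{\tilde c}(h_\infty)$.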

Assume for now (argued below) that there exists a sequence of non-negative densities $\hat h^\delta_n \in L^1(Q)$ ($n \in N_0$, where the index set $N_0$ is the set of natural numbers defined in Propositions 4.4-4.5), with the following properties for large enough $n$:

(P1) $\hat h^\delta_n \le \bar h_n$ on $Z^{good}_n$, where $Z^{good}_n \subset Q$ is a rectangular set satisfying $\mathcal{L}^{2d}[Z^{good}_n] \to 1$ as $n \to \infty$,
(P2) $\hat h^\delta_n|_{Z^{good}_n}$ and $h_n|_{Z^{good}_n}$ have the same marginals,
(P3) $\hat h^\delta_n$ is bounded by a constant independent of $n$ on $Q \setminus Z^{good}_n$,
(P4) $I_{c_n}(\hat h^\delta_n) \to I_{\tilde c}(h^\delta_\infty)$ as $n \to \infty$.

By Lemma 5.2, constrained optimality is inherited by restriction to rectangular sets. Hence since, by Proposition 5.4, $h_n$ is $c_n$-optimal among all densities which share its marginals and which are dominated by $\bar h_n$, its restriction $h_n|_{Z^{good}_n}$ remains $c_n$-optimal among all densities which share its marginals and which are dominated by $\bar h_n$. In particular, by (P1)-(P2), $I_{c_n}(\hat h^\delta_n|_{Z^{good}_n}) \ge I_{c_n}(h_n|_{Z^{good}_n})$. Hence,

\[
\begin{aligned}
I_{c_n}(\hat h^\delta_n) &= I_{c_n}(\hat h^\delta_n|_{Z^{good}_n}) + I_{c_n}(\hat h^\delta_n|_{Q \setminus Z^{good}_n}) \\
&\ge I_{c_n}(h_n|_{Z^{good}_n}) + I_{c_n}(\hat h^\delta_n|_{Q \setminus Z^{good}_n}) \\
&= I_{\tilde c}(h_n|_{Z^{good}_n}) + I_{\tilde c}(\hat h^\delta_n|_{Q \setminus Z^{good}_n}) + o(1) \qquad (16)
\end{aligned}
\]

where we have used equation (14) to go from the second line to the third.

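Note that since $|x \cdot y| \le d/4$ on $Q$, for any $k \in L^1(Q)$: $|I_{\tilde c}(k)| \le \int_Q |x \cdot y|\,|k(x, y)| \le \frac{d}{4} |k|[Q]$. Hence, rearranging equation (16) we get

\[
\begin{aligned}
I_{\tilde c}(h_n) - I_{c_n}(\hat h^\delta_n) + o(1) &\le I_{\tilde c}(h_n|_{Q \setminus Z^{good}_n}) - I_{\tilde c}(\hat h^\delta_n|_{Q \setminus Z^{good}_n}) \\
&\le |I_{\tilde c}(h_n|_{Q \setminus Z^{good}_n})| + |I_{\tilde c}(\hat h^\delta_n|_{Q \setminus Z^{good}_n})| \\
&\le \frac{d}{4}\big(h_n[Q \setminus Z^{good}_n] + \hat h^\delta_n[Q \setminus Z^{good}_n]\big) \\
&\le \frac{d}{4}\Big(\bar R + \frac{(\bar R + 1)^3}{r^2}\Big)\mathcal{L}^{2d}[Q \setminus Z^{good}_n],
\end{aligned}
\]

where the last inequality above follows from (c) of Proposition 4.4 and property (P3). Letting $n \to \infty$ above, and using properties (P1) and (P4) as well as the continuity of the linear functional $I_{\tilde c}(\cdot)$, we get that $I_{\tilde c}(h^\delta_\infty) \ge I_{\tilde c}(h_\infty)$, contradicting equation (15). Hence for every $(x_0, y_0) \in N$ either $0 = h(x_0, y_0)$ or $h(x_0, y_0) = \bar h(x_0, y_0)$. In other words, $h$ is geometrically extreme.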

In the rest of this proof we demonstrate the existence of a sequence $\hat h^\delta_n$ with properties (P1)-(P4). We do this in several steps. For ease of reference, we record the following chain of inequalities when $0 < \delta < \min(r, R - r)$:

\[
0 < \frac{r - \delta}{2} < r - \delta \le h^\delta_\infty \le r + \delta < \frac{R + r + \delta}{2} < \frac{3R + r + \delta}{4} < R.
\]

Recall we are supposing by contradiction that $0 < h(x_0, y_0) < \bar h(x_0, y_0)$.

Step (1): Construction of densities $h^\delta_n$. Let $f_n := (h_n)_X$, $g_n := (h_n)_Y$, $f_\infty := (h_\infty)_X$, and $g_\infty := (h_\infty)_Y$. Note that $f_n$ and $g_n$ have same total mass, and that $f_\infty$ and $g_\infty$ have same total mass. Since $f_n$ and $f_\infty$ may not have the same total mass, we will work with normalized copies $f'_n := \frac{h_\infty[Q]}{h_n[Q]} f_n$ and $g'_n := \frac{h_\infty[Q]}{h_n[Q]} g_n$. It follows from Remark 4.3 that $h_n[Q] \to h_\infty[Q] > 0$, hence $f'_n$ and $g'_n$ are well-defined, at least for large enough $n$, which is all we will use. Note that for large enough $n$, $f'_n$, $g'_n$, $f_\infty$ and $g_\infty$ all have the same total mass. Since $\bar h$ is bounded and of compact support, so is $h$, hence so are $f_n, g_n, f'_n, g'_n \in L^1([-\frac12, \frac12]^d)$, as well as $f_\infty, g_\infty \in L^1([-\frac12, \frac12]^d)$.

Let $S_n : [-\frac12, \frac12]^d \to [-\frac12, \frac12]^d$ be the unique measure preserving map between $f_\infty$ and $f'_n$ (see [GM95]) minimizing the cost $\tilde c(x, y) = x \cdot y$. Similarly let $T_n : [-\frac12, \frac12]^d \to [-\frac12, \frac12]^d$ be the unique measure preserving map between $g_\infty$ and $g'_n$ minimizing $\tilde c(x, y)$. Note that $S_n$ and $T_n$ are essentially bijections (see [GM96]).

Recall (e.g. [GM95]) that a measure preserving map $s$ between two $L^1$-functions $f$ and $g$ is a Borel map which satisfies the change of variables formula

(17) $\int_{\mathbb{R}^d} h(y) g(y)\,dy = \int_{\mathbb{R}^d} h(s(x)) f(x)\,dx$,

for all $h$ continuous on $\mathbb{R}^d$. Given $f \in L^1(\mathbb{R}^d)$ and a Borel map $s : \mathbb{R}^d \to \mathbb{R}^d$, there is a unique function $g \in L^1(\mathbb{R}^d)$ satisfying equation (17). Call this $g$ the push forward of $f$ by $s$, denoted $s_\# f$. Note that $s$ is measure preserving between $f$ and $s_\# f$. Whenever $s$ is a diffeomorphism, equation (17) implies

(18) $g(s(x))\,|\det Ds(x)| = f(x)$.

From [M97], if $s$ fails to be a diffeomorphism but is given by the gradient of a convex function, equation (18) continues to hold $f$-a.e.
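Equations (17) and (18) can be sanity-checked in one dimension. The sketch below assumes a hypothetical example that is not from the paper: $f = 1$ on $[0, 1]$ and $s(x) = e^x$, whose push forward is $g(y) = 1/y$ on $[1, e]$. The test function called $h$ in (17) is named `phi` in the code to avoid clashing with the densities, and integrals are approximated by a midpoint rule.

```python
import math


def midpoint(fun, a, b, n=2000):
    """Midpoint-rule approximation of the integral of fun over [a, b]."""
    w = (b - a) / n
    return sum(fun(a + (i + 0.5) * w) for i in range(n)) * w


def s(x):            # the map, a diffeomorphism from [0, 1] onto [1, e]
    return math.exp(x)


def ds(x):           # its derivative, so |det Ds(x)| = exp(x) when d = 1
    return math.exp(x)


def f(x):            # source density on [0, 1]
    return 1.0


def g(y):            # claimed push forward of f by s, supported on [1, e]
    return 1.0 / y


def phi(y):          # a continuous test function (the h of equation (17))
    return y * y


def check_change_of_variables():
    lhs = midpoint(lambda y: phi(y) * g(y), 1.0, math.e)     # left side of (17)
    rhs = midpoint(lambda x: phi(s(x)) * f(x), 0.0, 1.0)     # right side of (17)
    # Largest pointwise violation of (18) on a few sample points.
    pointwise = max(abs(g(s(x)) * abs(ds(x)) - f(x))
                    for x in [0.0, 0.25, 0.5, 0.75, 1.0])
    return lhs, rhs, pointwise


if __name__ == "__main__":
    print(check_change_of_variables())
```

Both sides of (17) approximate $(e^2 - 1)/2$, and the pointwise identity (18) holds up to rounding.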

Recall [B91] [M95] that for the cost $\tilde c(x, y) = x \cdot y$, the optimal maps $S_n(x)$ and $T_n(y)$ have the form $x \mapsto \nabla\psi(x)$ and $y \mapsto \nabla\phi(y)$, where $\psi$ and $\phi$ are convex functions. By Alexandrov's Theorem a convex function has second order derivatives almost everywhere. Hence it makes sense to talk about the derivatives $DS_n$ and $DT_n$ almost everywhere.

We note that $(S_n \times T_n)_\# k \in \Gamma(f'_n, g'_n)$ for any $k \in \Gamma(f_\infty, g_\infty)$. It is straightforward to see this: we check that $((S_n \times T_n)_\# k)_X = f'_n$ (checking $((S_n \times T_n)_\# k)_Y = g'_n$ is similar). For any $h \in C([-\frac12, \frac12]^d)$,

\[
\int h(x)\,((S_n \times T_n)_\# k)_X(x) = \iint h(x)\,((S_n \times T_n)_\# k)(x, y) = \iint h(S_n(x))\,k(x, y) = \int h(S_n(x))\,f_\infty(x) = \int h(x)\,f'_n(x).
\]

Let $h^\delta_n := \frac{h_n[Q]}{h_\infty[Q]} (S_n \times T_n)_\# h^\delta_\infty$. By the above, $h^\delta_n \in \Gamma(f_n, g_n)$ for all $n$, that is $h^\delta_n$ has the same marginals as $h_n$.

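Step (2): We next show that $h^\delta_n$, where $n \in N_0$, satisfies property (P1).

Recall the notation of Proposition 4.5. Denoting $\tilde X^{good}_n := S_n^{-1}(X^{good}_n)$ (respectively $\tilde Y^{good}_n := T_n^{-1}(Y^{good}_n)$), we have that $\mathcal{L}^d[\tilde X^{good}_n] \to 1$ (respectively that $\mathcal{L}^d[\tilde Y^{good}_n] \to 1$).

By (b) of Proposition 4.5, $f_n \to h(x_0, y_0) = r$ 'uniformly' on $X^{good}_n$. By (c) of Proposition 4.5, $f_n \le \|\bar h\|_{L^\infty(Q_1)} = \bar R$ on $[-\frac12, \frac12]^d$. Since $S_n$ is a convex gradient, equation (18) applies to give

\[
\frac{1}{|\det(DS_n)(x)|} = \frac{f'_n(S_n(x))}{f_\infty(x)} = \frac{h_\infty[Q]}{h_n[Q]} \frac{f_n(S_n(x))}{r} \to 1
\]

'uniformly' on $\tilde X^{good}_n$; while on $[-\frac12, \frac12]^d$, $\frac{1}{|\det(DS_n)(x)|} \le \frac{h_\infty[Q]}{h_n[Q]} \frac{\bar R}{r} < \frac{\bar R + 1}{r}$ for large enough $n$.

Similarly, by Proposition 4.5 and equation (18),

\[
\frac{1}{|\det(DT_n)(y)|} = \frac{g'_n(T_n(y))}{g_\infty(y)} = \frac{h_\infty[Q]}{h_n[Q]} \frac{g_n(T_n(y))}{r} \to 1
\]

'uniformly' on $\tilde Y^{good}_n$; while on $[-\frac12, \frac12]^d$, $\frac{1}{|\det(DT_n)(y)|} \le \frac{h_\infty[Q]}{h_n[Q]} \frac{\bar R}{r} < \frac{\bar R + 1}{r}$ for large enough $n$. Hence,

\[
\frac{1}{|\det(D(S_n \times T_n))(x, y)|} = \frac{1}{|\det(DS_n)(x)|\,|\det(DT_n)(y)|} \to 1
\]

uniformly on $\tilde Z^{good}_n := \tilde X^{good}_n \times \tilde Y^{good}_n$; while on $Q$, $\frac{1}{|\det(D(S_n \times T_n))|} < \left(\frac{\bar R + 1}{r}\right)^2$ for large enough $n$.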

Note that the optimal map from $(f_\infty, g_\infty)$ to $(f'_n, g'_n)$ is given by $(x, y) \mapsto (S_n(x), T_n(y)) = \nabla(\psi(x) + \phi(y))$, a gradient of a convex function. So equation (18) applies to give

\[
h^\delta_n((S_n \times T_n)(x, y)) = \frac{h_n[Q]}{h_\infty[Q]} \frac{h^\delta_\infty(x, y)}{|\det(D(S_n \times T_n))(x, y)|}.
\]

It follows that for large enough $n$,

(19) $0 < \frac{r - \delta}{2} < h^\delta_n((S_n \times T_n)(x, y)) < \frac{R + r + \delta}{2} < R$,

for almost every $(x, y) \in \tilde Z^{good}_n$; while on $Q$ and for large enough $n$,

(20) $h^\delta_n \le \left(\frac{\bar R + 1}{r}\right)^2 R < \frac{(\bar R + 1)^3}{r^2}$.



Recall that by (b) of Corollary 4.3, $\bar h_n \to R$ uniformly on $Q$. It follows, using equation (19), that for large enough $n$, $h^\delta_n|_{Z^{good}_n} \le \bar h_n|_{Z^{good}_n}$.

Step (3): Note that even though $h^\delta_n$ and $h_n$ have the same marginals on $Q$, the marginals of $h^\delta_n|_{Z^{good}_n}$ and $h_n|_{Z^{good}_n}$ may not be the same. In step (4) $h^\delta_n$ will be perturbed by a density $\tilde h^\delta_n$ so that $(h^\delta_n + \tilde h^\delta_n)|_{Z^{good}_n}$ and $h_n|_{Z^{good}_n}$ have the same marginals. The perturbation will be chosen to preserve the capacity bound on $Z^{good}_n$: $(h^\delta_n + \tilde h^\delta_n)|_{Z^{good}_n} \le \bar h_n|_{Z^{good}_n}$. In this step we construct $\tilde h^\delta_n$.

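Let $\tilde f^\delta_n := ((h_n - h^\delta_n)|_{Z^{good}_n})_X \in L^1(X^{good}_n)$ and $\tilde g^\delta_n := ((h_n - h^\delta_n)|_{Z^{good}_n})_Y \in L^1(Y^{good}_n)$ be the marginals of $(h_n - h^\delta_n)|_{Z^{good}_n}$. Since $h_n$ and $h^\delta_n$ have the same marginals on $Q$, $\int_{Y^{good}_n} (h_n - h^\delta_n)(x, y)\,dy + \int_{Y^{bad}_n} (h_n - h^\delta_n)(x, y)\,dy = 0$.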

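Hence, by (c) of Proposition 4.4 and equation (20),

\[
\begin{aligned}
|\tilde f^\delta_n(x)| &= \Big|\int_{Y^{good}_n} (h_n - h^\delta_n)(x, y)\,dy\Big| = \Big|\int_{Y^{bad}_n} (h_n - h^\delta_n)(x, y)\,dy\Big| \\
&\le \int_{Y^{bad}_n} |h_n - h^\delta_n|(x, y)\,dy \le \int_{Y^{bad}_n} (|h_n| + |h^\delta_n|)(x, y)\,dy \\
&\le \Big(\bar R + \frac{(\bar R + 1)^3}{r^2}\Big)\mathcal{L}^d[Y^{bad}_n].
\end{aligned}
\]

Similarly $|\tilde g^\delta_n(y)| \le \Big(\bar R + \frac{(\bar R + 1)^3}{r^2}\Big)\mathcal{L}^d[X^{bad}_n]$. It follows from Lemma 7.1 that there exists a joint density $\tilde h^\delta_n \in L^1(X^{good}_n \times Y^{good}_n)$ with marginals $\tilde f^\delta_n$ and $\tilde g^\delta_n$ such that

(21) $\|\tilde h^\delta_n\|_{L^\infty(X^{good}_n \times Y^{good}_n)} < 3\epsilon_n \Big(\frac{1}{\mathcal{L}^d[X^{good}_n]} + \frac{1}{\mathcal{L}^d[Y^{good}_n]}\Big)$, where $\epsilon_n := \Big(\bar R + \frac{(\bar R + 1)^3}{r^2}\Big)\max\{\mathcal{L}^d[X^{bad}_n], \mathcal{L}^d[Y^{bad}_n]\}$.

Since the right hand side of equation (21) tends to $0$ as $n \to \infty$, by choosing $n$ large enough, we can make sure the densities $\tilde h^\delta_n$ are as close to $0$ as we like. In particular, for large enough $n$,

(22) $|\tilde h^\delta_n(x, y)| < \min\Big\{\frac{R - (r + \delta)}{4}, \frac{r - \delta}{4}\Big\}$.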

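Step (4): Establishing properties (P1)-(P4) for the densities $\hat h^\delta_n$.

Let $\hat h^\delta_n := h^\delta_n + \tilde h^\delta_n$. Note that although $\tilde h^\delta_n$ could be negative, $\hat h^\delta_n$ is non-negative: from equations (19) and (22) we have that $\hat h^\delta_n = h^\delta_n + \tilde h^\delta_n > \frac{r - \delta}{2} - \frac{r - \delta}{4} = \frac{r - \delta}{4} > 0$. Since the marginals of $\tilde h^\delta_n$ are $\tilde f^\delta_n = ((h_n - h^\delta_n)|_{Z^{good}_n})_X$ and $\tilde g^\delta_n = ((h_n - h^\delta_n)|_{Z^{good}_n})_Y$, $\hat h^\delta_n$ and $h_n$ have the same marginals on $Z^{good}_n$. This establishes property (P2).

By (b) of Remark 4.3, $\bar h_n \to R$ uniformly on $Q$. Hence, since $R > \frac{3R + r + \delta}{4}$, for large enough $n$: $\bar h_n > \frac{3R + r + \delta}{4}$ on $Q$. On the other hand by equations (19) and (22), for large enough $n$, $\hat h^\delta_n = h^\delta_n + \tilde h^\delta_n < \frac{R + r + \delta}{2} + \frac{R - (r + \delta)}{4} = \frac{3R + r + \delta}{4} < \bar h_n$ on $Z^{good}_n$. This establishes property (P1).

Since the perturbation $\tilde h^\delta_n$ is supported on $Z^{good}_n$, $\hat h^\delta_n = h^\delta_n$ on $Q \setminus Z^{good}_n$. Hence, using equation (20), $\hat h^\delta_n = h^\delta_n < \frac{(\bar R + 1)^3}{r^2}$ on $Q \setminus Z^{good}_n$. This establishes property (P3).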

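To establish property (P4) we need to show that $I_{c_n}(\hat h^\delta_n) = I_{c_n}(h^\delta_n + \tilde h^\delta_n) \to I_{\tilde c}(h^\delta_\infty)$ as $n \to \infty$. Note that by equation (21), $\tilde h^\delta_n \to 0$ uniformly on $Q$. Hence by equation (13) and Lebesgue's Dominated Convergence Theorem, $I_{c_n}(\tilde h^\delta_n) = I_{\tilde c}(\tilde h^\delta_n) + o(1) \to 0$ as $n \to \infty$. So we need only show $I_{c_n}(h^\delta_n) \to I_{\tilde c}(h^\delta_\infty)$ as $n \to \infty$.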

Stability of the transport map [V09, Corollary 5.23] implies that $S_n, T_n$ converge in measure to $-\mathrm{id}|_{[-\frac12, \frac12]^d}$, minus the identity map on $[-\frac12, \frac12]^d$. By extracting a subsequence if necessary we can assume [Ro68, Proposition 4.17] that $S_n, T_n$ converge to $-\mathrm{id}|_{[-\frac12, \frac12]^d}$ almost everywhere on $[-\frac12, \frac12]^d$. Since $\tilde c(\cdot, \cdot)$ is continuous, it follows that $\tilde c(S_n(x), T_n(y))$ converges to $\tilde c(x, y)$ almost everywhere on $Q = [-\frac12, \frac12]^d \times [-\frac12, \frac12]^d$.

Note that $|\tilde c(S_n(x), T_n(y))\,h^\delta_\infty(x, y)|$ is bounded above on $Q$, e.g. by $\|\tilde c\|_{L^\infty(Q)} R$. Hence, since $\mathcal{L}^{2d}[Q] < \infty$, we can apply the Dominated Convergence Theorem to conclude that as $n \to \infty$:

\[
\begin{aligned}
I_{c_n}(h^\delta_n) &= \int_Q \tilde c(x, y)\,h^\delta_n(x, y) + o(1) \\
&= \int_Q \tilde c(S_n(x), T_n(y))\,h^\delta_\infty(x, y) + o(1) \\
&\to \int_Q \tilde c(x, y)\,h^\delta_\infty(x, y) = I_{\tilde c}(h^\delta_\infty).
\end{aligned}
\]

This establishes property (P4) and completes the proof.



□ 



8. Optimal solution to the constrained problem is unique 

We now show that, given a capacity constraint $\bar h$, the corresponding constrained optimization problem has a unique solution. In the unconstrained optimization setup, a characteristic property of optimal solutions is c-cyclical monotonicity. This property can be used to prove a solution is unique [GM96, Theorem 3.7]. The property of optimal solutions in the constrained setup that is used here to prove uniqueness is that of being geometrically extreme (see Definition 6.2). Note that in the unconstrained case, c-cyclical monotonicity is in fact necessary and sufficient for optimality, whereas in the constrained case geometric extremality is merely necessary.

Theorem 8.1. (Uniqueness) Let the cost $c(x, y)$ satisfy conditions (C1)-(C3) of subsection 2.1. Let the capacity bound $0 \le \bar h \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ have compact support. Take $0 \le f, g \in L^1_c(\mathbb{R}^d)$ such that $\Gamma(f, g)^{\bar h} \ne \emptyset$. Then an optimal solution to the constrained problem (3) is unique (as an element of $L^1(\mathbb{R}^d \times \mathbb{R}^d)$).

Proof. Suppose $h_1, h_2$ are two optimal plans: $h_1, h_2 \in \operatorname{argmin}_{k \in \Gamma(f, g)^{\bar h}} I_c(k)$. We show $h_1 = h_2$ almost everywhere. Since $\Gamma(f, g)^{\bar h}$ is convex, $\frac12 h_1 + \frac12 h_2 \in \Gamma(f, g)^{\bar h}$. Since $I_c(\cdot)$ is linear, the plan $\frac12 h_1 + \frac12 h_2$ is also optimal.

Hence, by Theorem 7.2, $h_1$, $h_2$, and $\frac12 h_1 + \frac12 h_2$ are all geometrically extreme. In particular, $h_i = \bar h 1_{W_i}$ almost everywhere on $\mathbb{R}^d \times \mathbb{R}^d$ for $i = 1, 2$. Let $\Delta := (W_1 \setminus W_2) \cup (W_2 \setminus W_1)$ be the symmetric difference of the sets $W_1$ and $W_2$. Either $\mathcal{L}^{2d}(\Delta) = 0$, in which case $h_1 = h_2$ almost everywhere, or else for almost every $(x, y) \in \Delta$, $0 < (\frac12 h_1 + \frac12 h_2)(x, y) < \bar h(x, y)$, contradicting Theorem 7.2. □
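The averaging step in this proof can be illustrated on a toy grid. The sketch below assumes a hypothetical $2 \times 2$ discretization of $[-\frac12, \frac12]^2$ with $\bar h = 2$ in every cell; the two plans `h1` and `h2` are illustrative choices with equal marginals and are not claimed to be optimal for any cost. It only shows that the midpoint of two distinct geometrically extreme densities with a common bound fails to be geometrically extreme.

```python
def is_geometrically_extreme(h, hbar, tol=1e-12):
    """True if every cell value of h equals 0 or the capacity hbar there."""
    return all(abs(v) <= tol or abs(v - b) <= tol
               for row_h, row_b in zip(h, hbar)
               for v, b in zip(row_h, row_b))


def marginals(h, w):
    """Row and column integrals of h on a grid with square cells of width w."""
    rows = [sum(row) * w for row in h]
    cols = [sum(h[i][j] for i in range(len(h))) * w for j in range(len(h[0]))]
    return rows, cols


# Hypothetical data: capacity 2 on each of the four cells of width 1/2.
W = 0.5
HBAR = [[2.0, 2.0], [2.0, 2.0]]
H1 = [[2.0, 0.0], [0.0, 2.0]]    # hbar restricted to W_1 (diagonal cells)
H2 = [[0.0, 2.0], [2.0, 0.0]]    # hbar restricted to W_2 (anti-diagonal cells)
AVG = [[0.5 * a + 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(H1, H2)]

if __name__ == "__main__":
    print(marginals(H1, W), marginals(H2, W), marginals(AVG, W))
    print(is_geometrically_extreme(H1, HBAR),
          is_geometrically_extreme(H2, HBAR),
          is_geometrically_extreme(AVG, HBAR))
```

Here the symmetric difference of $W_1$ and $W_2$ is the whole grid, and the average equals $1$ there, strictly between $0$ and $\bar h = 2$.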






9. Appendix: Duality and Examples 

In this appendix we sketch how the analog of Kantorovich duality [K42] would look for the constrained problem, following the minimax heuristics in [AG11] [MG10]. One of the virtues of such a duality is that it makes it easy to check whether a conjectured optimizer is actually optimal. Deferring the elaboration of a full duality theory to a future manuscript [KM12], below we develop just enough theory to confirm the claims made in example 1.1.

Suppose $f$ and $g$ have total mass 1 on $\mathbb{R}^d$ and recall the Duality Theorem from linear programming (e.g. [V03]). In the unconstrained context the primal problem is the unconstrained transportation problem and the dual problem is

(23) $\sup_{(u, v) \in Lip_c} -\int_{\mathbb{R}^d} u(x) f(x)\,dx - \int_{\mathbb{R}^d} v(y) g(y)\,dy$,

where $Lip_c := \{(u, v) \in L^1(\mathbb{R}^d) \times L^1(\mathbb{R}^d) \mid c(x, y) + u(x) + v(y) \ge 0$ for all $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d\}$. We now formulate a dual problem in the constrained context. For the primal problem (3) we consider the following dual problem
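(24) $\sup_{(u, v, w) \in Lip^{\bar h}_c} -\int_{\mathbb{R}^d} u(x) f(x)\,dx - \int_{\mathbb{R}^d} v(y) g(y)\,dy + \int_{\mathbb{R}^d \times \mathbb{R}^d} w(x, y) \bar h(x, y)\,dx\,dy$,

where $Lip^{\bar h}_c := \{(u, v, w) \in L^1(\mathbb{R}^d) \times L^1(\mathbb{R}^d) \times L^1(\mathbb{R}^d \times \mathbb{R}^d) \mid c(x, y) + u(x) + v(y) - w(x, y) \ge 0$ and $w(x, y) \le 0$ for all $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d\}$. It follows from the definition of $(u, v, w) \in Lip^{\bar h}_c$, by integrating $c(x, y) \ge -u(x) - v(y) + w(x, y)$ against $h \in \Gamma(f, g)^{\bar h}$, that

\[
\int_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y) h(x, y) \ge \int_{\mathbb{R}^d \times \mathbb{R}^d} \{-u(x) - v(y) + w(x, y)\} h(x, y) \ge -\int_{\mathbb{R}^d} u(x) f(x) - \int_{\mathbb{R}^d} v(y) g(y) + \int_{\mathbb{R}^d \times \mathbb{R}^d} w(x, y) \bar h(x, y).
\]

Hence when

(25) $\int c h = -\int u f - \int v g + \int w \bar h$

we conclude that $h \in \Gamma(f, g)^{\bar h}$ is a minimizer of (3) and $(u, v, w) \in Lip^{\bar h}_c$ a maximizer of (24).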



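We now discuss example 1.1 where $c(x, y) = \frac12 |x - y|^2$, $f = g = 1_{[-\frac12, \frac12]}$, and $\bar h = 2 \cdot 1_{[-\frac12, \frac12]^2}$ (figure 1B).

Let $u(x) := -\frac12 x^2$ and $v(y) := -\frac12 y^2$. Let $S := \{(x, y) \in \mathbb{R}^2 \mid c(x, y) + u(x) + v(y) < 0\} = \{(x, y) \in \mathbb{R}^2 \mid xy > 0\}$. Note that, up to a set of measure zero, $S \cap [-\frac12, \frac12]^2 = [-\frac12, 0] \times [-\frac12, 0] \cup [0, \frac12] \times [0, \frac12]$. Now let $h := \bar h|_{S \cap [-\frac12, \frac12]^2} \in \Gamma(f, g)^{\bar h}$ (see figure 1A) and let

\[
w(x, y) := \begin{cases} c(x, y) + u(x) + v(y) & \text{on } S, \\ 0 & \text{on } \mathbb{R}^2 \setminus S. \end{cases}
\]

Since $w(x, y) \le 0$ on $\mathbb{R}^2$, and $c(x, y) + u(x) + v(y) - w(x, y)$ is $= 0$ on $S$ and is $\ge 0$ on $\mathbb{R}^2 \setminus S$, we have $(u, v, w) \in Lip^{\bar h}_c$. Integrating $w$ against $\bar h$ we get:

\[
\begin{aligned}
\int c(x, y) h(x, y) + \int u(x) f(x) + \int v(y) g(y) &= \int \{c(x, y) + u(x) + v(y)\} h(x, y) = \int w(x, y) h(x, y) \\
&= \int_{S \cap [-\frac12, \frac12]^2} w(x, y) \bar h(x, y) = \int_{\mathbb{R} \times \mathbb{R}} w(x, y) \bar h(x, y).
\end{aligned}
\]

That is, the given $h$, $u$, $v$, and $w$ satisfy equation (25). Hence $h$ minimizes the primal problem, and so is optimal, while $(u, v, w)$ maximizes the dual problem.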
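The equality (25) for this example can be confirmed numerically. The following is a minimal sketch using a midpoint rule on an $n \times n$ grid over $[-\frac12, \frac12]^2$; the grid size `n` and the function name are illustrative assumptions, and `n` is taken even so that no midpoint lies on the axes. A direct calculation gives the common value $\frac{1}{48}$ for both sides.

```python
def duality_check(n=200):
    """Return (primal, dual) values for example 1.1 via a midpoint rule."""
    assert n % 2 == 0                      # keeps midpoints off the axes
    w = 1.0 / n
    pts = [-0.5 + (i + 0.5) * w for i in range(n)]

    def c(x, y): return 0.5 * (x - y) ** 2
    def u(x): return -0.5 * x * x
    def v(y): return -0.5 * y * y
    hbar = 2.0                             # capacity bound on the square

    primal = 0.0                           # integral of c * h
    w_term = 0.0                           # integral of w * hbar
    for x in pts:
        for y in pts:
            on_S = x * y > 0               # S = {xy > 0}
            h = hbar if on_S else 0.0      # h = hbar restricted to S
            w_val = c(x, y) + u(x) + v(y) if on_S else 0.0
            primal += c(x, y) * h * w * w
            w_term += w_val * hbar * w * w

    # f = g = 1 on [-1/2, 1/2], so the u and v integrals reduce to 1D sums.
    dual = -sum(u(x) for x in pts) * w - sum(v(y) for y in pts) * w + w_term
    return primal, dual


if __name__ == "__main__":
    print(duality_check())
```

Agreement of the two numbers reflects the fact that $h$ saturates the capacity exactly where $c + u + v < 0$ and vanishes elsewhere.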

References 

[AG11] L.A. Ambrosio and N. Gigli. A user's guide to optimal transport. Preprint.

[B87] Y. Brenier. Décomposition polaire et réarrangement monotone des champs de vecteurs, C. R. Acad. Sci. Paris Sér. I Math., 305 (1987), 805-808.

[B91] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions, Comm. Pure Appl. Math., 44 (1991), 375-417.

[GM95] W. Gangbo and R. J. McCann. Optimal Maps in Monge's Mass Transport Problem, Comptes Rendus Académie des Sciences Paris, 321 (1995), Série I, 1653-1658.

[GM96] W. Gangbo and R. J. McCann. The geometry of optimal transportation, Acta Math., 177 (1996), 113-161.

[EG92] Lawrence C. Evans and Ronald F. Gariepy. Measure Theory and Fine Properties of Functions, Studies in Advanced Mathematics, CRC Press Inc., 1992.

[K42] L. Kantorovich. On the Translocation of Masses, C.R. (Doklady) Acad. Sci. URSS (N.S.) 37 (1942), 199-201.

[KM12] J. Korman and R. McCann, work in progress.

[LL01] E. H. Lieb and M. Loss. Analysis, 2nd edition, vol. 14 of Graduate Studies in Mathematics. American Mathematical Society, Providence, 2001.

[M97] R. J. McCann. A convexity principle for interacting gases, Adv. Math., 128 (1997), 153-179.

[M95] R. J. McCann. Existence and uniqueness of monotone measure-preserving maps, Duke Math. J., 80 (1995), 309-323.






[M99] R. J. McCann. Exact solutions to the transportation problem on the line, Proc. R. Soc. Lond. Ser. A, 455 (1999), 1341-1380.

[MPW10] R. J. McCann, Brendan Pass, and Micah Warren. Rectifiability of Optimal Transportation Plans, 2010. To appear in Canad. J. Math.

[MG10] R. J. McCann and Nestor Guillen. Five Lectures on Optimal Transportation: Geometry, Regularity and Applications, 2010. To appear in Analysis and Geometry of Metric Measure Spaces, Lecture Notes of the 50th Séminaire de Mathématiques Supérieures (SMS) Montréal, 2011. G. Dafni et al, eds.

[Mo81] G. Monge. Mémoire sur la théorie des déblais et de remblais. Histoire de l'Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année, pages 666-704, 1781.

[Ro68] H. L. Royden. Real Analysis, 2nd edition, Collier-Macmillan Limited, London, 1968.

[Ru87] W. Rudin. Real & Complex Analysis, 3rd edition, McGraw-Hill, 1987.

[Sp80] M. Spivak. Calculus, 2nd edition, Publish or Perish Inc., 1980.

[V03] C. Villani. Topics in Optimal Transportation, vol. 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, 2003.

[V09] C. Villani. Optimal Transport, Old and New, vol. 334 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, New York, 2009.
E-mail address: jkorman@math.toronto.edu

E-mail address: mccann@math.toronto.edu 



Department of Mathematics, University of Toronto, Toronto On- 
tario M5S 2E4 Canada 



